Abstract

In this paper, we develop a robust uncertainty principle for finite signals in $\mathbb{C}^N$
which states that for almost all choices $T, \Omega \subset \{0, \ldots, N-1\}$ such that

$$|T| + |\Omega| \asymp (\log N)^{-1/2} \cdot N,$$

there is no signal $f$ supported on $T$ whose discrete Fourier transform $\hat f$ is supported on
$\Omega$. In fact, we can make the above uncertainty principle quantitative in the sense that
if $f$ is supported on $T$, then only a small percentage of the energy (less than half, say)
of $\hat f$ is concentrated on $\Omega$.

As an application of this robust uncertainty principle (QRUP), we consider the
problem of decomposing a signal into a sparse superposition of spikes and complex
sinusoids

$$f(s) = \sum_{t \in T} \alpha_1(t)\,\delta(s - t) + \sum_{\omega \in \Omega} \alpha_2(\omega)\, e^{i 2\pi \omega s/N}/\sqrt{N}.$$

We show that if a generic signal $f$ has a decomposition $(\alpha_1, \alpha_2)$ using spike and
frequency locations in $T$ and $\Omega$ respectively, and obeying

$$|T| + |\Omega| \le \mathrm{Const} \cdot (\log N)^{-1/2} \cdot N,$$

then $(\alpha_1, \alpha_2)$ is the unique sparsest possible decomposition (all other decompositions
have more non-zero terms). In addition, if

$$|T| + |\Omega| \le \mathrm{Const} \cdot (\log N)^{-1} \cdot N,$$

then the sparsest $(\alpha_1, \alpha_2)$ can be found by solving a convex optimization problem.

Underlying our results is a new probabilistic approach which insists on finding the
correct uncertainty relation or the optimally sparse solution for nearly all subsets but
not necessarily all of them, and allows us to considerably sharpen previously known results
[9, 10]. In fact, we show that the fraction of sets $(T, \Omega)$ for which the above properties
do not hold can be upper bounded by quantities like $N^{-\alpha}$ for large values of $\alpha$.

The QRUP (and the application to sparse approximation) can be extended to
general pairs of orthogonal bases $\Phi_1, \Phi_2$ of $\mathbb{C}^N$. For almost all choices
$\Gamma_1, \Gamma_2 \subset \{0, \ldots, N-1\}$ obeying

$$|\Gamma_1| + |\Gamma_2| \asymp \mu(\Phi_1, \Phi_2)^{-2} \cdot (\log N)^{-m},$$

there is no signal $f$ such that $\Phi_1 f$ is supported on $\Gamma_1$ and $\Phi_2 f$ is supported on
$\Gamma_2$, where $\mu(\Phi_1, \Phi_2)$ is the mutual incoherence between $\Phi_1$ and $\Phi_2$.

1 Introduction

1.1 Uncertainty principles 

The classical Weyl-Heisenberg uncertainty principle states that a continuous-time signal
cannot be simultaneously well-localized in both time and frequency. Loosely speaking, this
principle says that if most of the energy of a signal $f$ is concentrated near a time-interval of
length $\Delta t$ and most of its energy in the frequency domain is concentrated near an interval
of length $\Delta\omega$, then

$$\Delta t \cdot \Delta\omega \ge 1.$$

This principle is one of the major intellectual achievements of the 20th century and since
then, much work has been concerned with extending such uncertainty relations to other
setups, namely, by investigating to what extent it is possible to concentrate a function $f$
and its Fourier transform $\hat f$, relaxing the assumption that $f$ and $\hat f$ be concentrated near
intervals as in the work of Landau, Pollak and Slepian [16, 17, 20], or by considering signals
supported on a discrete set [9, 21].

Because our paper is concerned with finite signals, we now turn our attention to "discrete
uncertainty relations" and begin by recalling the definition of the discrete Fourier transform

$$\hat f(\omega) = \frac{1}{\sqrt{N}} \sum_{t=0}^{N-1} f(t)\, e^{-i 2\pi \omega t/N}, \qquad (1.1)$$

where the frequency index $\omega$ ranges over the set $\{0, 1, \ldots, N-1\}$. For signals of length
$N$, [9] introduced a sharp uncertainty principle which simply states that the supports of a
signal $f$ in the time and frequency domains must obey

$$|\operatorname{supp} f| + |\operatorname{supp} \hat f| \ge 2\sqrt{N}. \qquad (1.2)$$

We emphasize that there is no restriction on the organization of the supports of $f$
and $\hat f$ other than the size constraint (1.2). [9] also observed that the uncertainty relation
(1.2) is tight in the sense that equality is achieved for certain special signals. For example,
consider as in [9, 10] the Dirac comb signal: we suppose that the sample size $N$ is a perfect
square and let $f$ be equal to 1 at multiples of $\sqrt{N}$ and 0 everywhere else

$$f(t) = \begin{cases} 1, & t = m\sqrt{N}, \quad m = 0, 1, \ldots, \sqrt{N} - 1 \\ 0, & \text{elsewhere.} \end{cases} \qquad (1.3)$$

Remarkably, the Dirac comb is invariant through the Fourier transform, i.e. $\hat f = f$, and
therefore, $|\operatorname{supp} f| + |\operatorname{supp} \hat f| = 2\sqrt{N}$. In other words, (1.2) holds with equality.
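
The invariance of the Dirac comb is easy to check numerically. The following is a minimal sketch, assuming NumPy and an illustrative sample size $N = 64$ (our choice, not a value used in the paper); it builds the comb, applies the unitary transform (1.1), and compares both sides of (1.2).

```python
import numpy as np

N = 64                      # perfect square, chosen only for illustration
root = int(np.sqrt(N))

# Dirac comb: 1 at multiples of sqrt(N), 0 elsewhere.
f = np.zeros(N)
f[::root] = 1.0

# Unitary DFT matching (1.1): NumPy's forward FFT uses exp(-2j*pi*w*t/N),
# so only the 1/sqrt(N) normalization has to be added.
f_hat = np.fft.fft(f) / np.sqrt(N)

tol = 1e-10                 # numerical threshold for "nonzero" (assumption)
supp_time = np.flatnonzero(np.abs(f) > tol)
supp_freq = np.flatnonzero(np.abs(f_hat) > tol)

print(np.allclose(f_hat, f))                        # comb equals its own transform
print(len(supp_time) + len(supp_freq), 2 * root)    # both sides of (1.2)
```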

In recent years, uncertainty relations have become very popular, in part because they help
explain some miraculous properties of $\ell_1$-minimization procedures as we will see below,
and researchers have naturally developed similar uncertainty relations between pairs of
bases other than the canonical basis and its conjugate. We single out the work of Elad
and Bruckstein [12] which introduces a generalized uncertainty principle for pairs $\Phi_1, \Phi_2$ of
orthonormal bases. Define the mutual incoherence [10, 12, 15] between $\Phi_1$ and $\Phi_2$ as

$$\mu(\Phi_1, \Phi_2) = \max_{\phi \in \Phi_1,\, \psi \in \Phi_2} |\langle \phi, \psi \rangle|; \qquad (1.4)$$

then if $\alpha_1$ is the (unique) representation of $f$ in basis $\Phi_1$ with $\Gamma_1 = \operatorname{supp} \alpha_1$, and $\alpha_2$ is the
representation in $\Phi_2$ with $\Gamma_2 = \operatorname{supp} \alpha_2$, the supports must obey

$$|\Gamma_1| + |\Gamma_2| \ge \frac{2}{\mu(\Phi_1, \Phi_2)}. \qquad (1.5)$$

Note that the mutual incoherence $\mu$ always obeys $1/\sqrt{N} \le \mu \le 1$ and measures how much the two
bases look alike. The smaller the incoherence, the stronger the uncertainty relation. To see
how this generalizes the discrete uncertainty principle, observe that in the case where $\Phi_1$ is
the canonical or spike basis and $\Phi_2$ is the Fourier basis, $\mu = 1/\sqrt{N}$ (maximal incoherence)
and (1.5) is, of course, (1.2).
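
As an illustration of definition (1.4), the sketch below computes the mutual incoherence of the spike and Fourier bases directly from their inner products. The helper name `mutual_incoherence` and the size $N = 64$ are our own choices for this example; the bases are stored as the columns of two matrices.

```python
import numpy as np

def mutual_incoherence(phi1, phi2):
    """Largest |<phi, psi>| over columns phi of phi1 and psi of phi2, as in (1.4)."""
    return np.max(np.abs(phi1.conj().T @ phi2))

N = 64
spikes = np.eye(N)                      # canonical (spike) basis
s = np.arange(N)
# Columns are the complex sinusoids exp(2j*pi*w*s/N)/sqrt(N).
fourier = np.exp(2j * np.pi * np.outer(s, s) / N) / np.sqrt(N)

mu = mutual_incoherence(spikes, fourier)
print(mu, 1 / np.sqrt(N))               # the two values agree
print(2 / mu, 2 * np.sqrt(N))           # right-hand sides of (1.5) and (1.2)
```

Comparing a basis with itself returns 1, the upper end of the range $1/\sqrt{N} \le \mu \le 1$.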



1.2 The tightness of the uncertainty relation is fragile 

It is true that there exist signals that saturate the uncertainty relations but such signals
are very special and are hardly representative of "generic" or "most" signals. Consider the
Dirac comb for instance; here the locations and heights of the $\sqrt{N}$ spikes in the time domain
carefully conspire to create an inordinate number of cancellations in the frequency domain.
This will not be the case for sparsely supported signals in general. Simple numerical
experiments confirm that signals with the same support as the Dirac comb but with different
spike amplitudes almost always have Fourier transforms that are nonzero everywhere.
Indeed, constructing pathological examples other than the Dirac comb requires mathematical
wizardry.
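
A numerical experiment of the kind described above can be sketched as follows. This is only an illustration under our own assumptions (NumPy, $N = 64$, complex Gaussian amplitudes, a fixed seed); it is not the experiment reported in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed, for reproducibility only
N = 64
root = int(np.sqrt(N))
support = np.arange(0, N, root)         # same support as the Dirac comb

# Replace the unit heights of the comb by random complex amplitudes.
f = np.zeros(N, dtype=complex)
f[support] = rng.standard_normal(root) + 1j * rng.standard_normal(root)

f_hat = np.fft.fft(f) / np.sqrt(N)      # unitary DFT as in (1.1)
nonzero_freqs = np.count_nonzero(np.abs(f_hat) > 1e-10)

print(len(support), nonzero_freqs)      # time support vs. frequency support
```

With generic amplitudes the cancellations that make the comb special disappear, and the frequency support fills the whole set $\{0, \ldots, N-1\}$.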

Moreover, if the signal length $N$ is prime (making signals like the Dirac comb impossible
to construct), the discrete uncertainty principle is sharpened to [25]

$$|\operatorname{supp} f| + |\operatorname{supp} \hat f| > N, \qquad (1.6)$$

which validates our intuition about the exceptionality of signals such as the Dirac comb.



1.3 Robust uncertainty principles 

Excluding these exceedingly rare and exceptional pairs $T := \operatorname{supp} f$, $\Omega := \operatorname{supp} \hat f$, how tight
is the uncertainty relation? That is, given two sets $T$ and $\Omega$, how large need $|T| + |\Omega|$ be
so that it is possible to construct a signal whose time and frequency supports are $T$ and
$\Omega$ respectively? In this paper, we introduce a robust uncertainty principle (for general $N$)
which illustrates that for "most" sets $T, \Omega$, (1.6) is closer to the truth than (1.2). Suppose
that we choose $(T, \Omega)$ at random from all pairs obeying

$$|T| + |\Omega| \le \frac{N}{\sqrt{(\beta + 1)\log N}}.$$

Then with overwhelmingly high probability (in fact, exceeding $1 - O(N^{-\beta\rho})$ for some positive
constant $\rho$, for which we shall give explicit values) we will be unable to find a signal in
$\mathbb{C}^N$ supported on $T$ in the time domain and $\Omega$ in the frequency domain. In other words,
remove a negligible fraction of sets and

$$|\operatorname{supp} f| + |\operatorname{supp} \hat f| > \frac{N}{\sqrt{(\beta + 1)\log N}}, \qquad (1.7)$$

holds, not (1.2).

Our uncertainty principle is not only robust in the sense that it holds for most sets, it is
also quantitative. Consider a random pair $(T, \Omega)$ as before and put $1_\Omega$ to be the indicator
function of the set $\Omega$. Then with essentially the same probability as above, we have

$$\|\hat f \cdot 1_\Omega\|^2 \le \|f\|^2/2, \qquad (1.8)$$

say, for all functions $f$ supported on $T$. By symmetry, the same inequality holds by
exchanging the role of $T$ and $\Omega$,

$$\|f \cdot 1_T\|^2 \le \|\hat f\|^2/2,$$

for all functions $\hat f$ supported on $\Omega$. Moreover, as with the discrete uncertainty principle,
the QRUP can be extended to arbitrary pairs of bases.



1.4 Significance of uncertainty principles 

In the last three years or so, there has been a series of papers starting with [10] establishing
a link between discrete uncertainty principles and sparse approximation [10, 11, 15, 26]. In
this field, the goal is to separate a signal $f \in \mathbb{C}^N$ into two (or more) components, each
representing contributions from different phenomena. The idea is as follows: suppose we
have two (or possibly many more) orthonormal bases $\Phi_1, \Phi_2$; we search among all the
decompositions $(\alpha_1, \alpha_2)$ of the signal $f$

$$f = \Phi_1 \alpha_1 + \Phi_2 \alpha_2 =: \Phi\alpha$$

for the shortest one

$$(P_0) \qquad \min_\alpha \|\alpha\|_{\ell_0}, \quad \Phi\alpha = f, \qquad (1.9)$$

where $\|\alpha\|_{\ell_0}$ is simply the size of the support of $\alpha$, $\|\alpha\|_{\ell_0} := |\{\gamma : \alpha(\gamma) \ne 0\}|$.

The discrete uncertainty principles (1.2) and (1.5) are useful in the sense that they tell us
when $(P_0)$ has a unique solution. When $\Phi$ is the time-frequency dictionary, it is possible
to show that if a signal $f$ has a decomposition $f = \Phi\alpha$ consisting of spikes on subdomain
$T$ and frequencies on $\Omega$, and

$$|T| + |\Omega| < \sqrt{N}, \qquad (1.10)$$

then $\alpha$ is the unique minimizer of $(P_0)$ [10]. In a nutshell, the reason is that if $\Phi(\alpha + \delta_0)$
were another decomposition, $\delta_0$ would obey $\Phi\delta_0 = 0$, which says that $\delta_0$ would be of the
form $\delta_0 = (\delta, -\hat\delta)$. Now (1.2) implies that $\delta_0$ would have at least $2\sqrt{N}$ nonzero entries
which in turn would give $\|\alpha + \delta_0\|_{\ell_0} \ge \sqrt{N}$ for all $\alpha$ obeying $\|\alpha\|_{\ell_0} < \sqrt{N}$, thereby
proving the claim. Note that again the condition (1.10) is sharp because of the extremal
signal (1.3). Indeed, the Dirac comb may be expressed as a superposition of $\sqrt{N}$ terms in
the time or in the frequency domain; for this special signal, $(P_0)$ does not have a unique
solution.

In [12], the same line of reasoning is followed for general pairs of orthogonal bases, and
$\ell_0$-uniqueness is guaranteed when

$$|\Gamma_1| + |\Gamma_2| < \frac{1}{\mu(\Phi_1, \Phi_2)}. \qquad (1.11)$$

Unfortunately, as far as finding the sparsest decomposition is concerned, solving $(P_0)$ directly is
computationally infeasible because of the highly non-convex nature of the $\|\cdot\|_{\ell_0}$ norm. To the best
of our knowledge, finding the minimizer obeying the constraints would require searching
over all possible subsets of columns of $\Phi$, an algorithm that is combinatorial in nature and
has exponential complexity. Instead of solving $(P_0)$, we consider a similar program in the
$\ell_1$ norm which goes by the name of Basis Pursuit [7]:

$$(P_1) \qquad \min_\alpha \|\alpha\|_{\ell_1}, \quad \Phi\alpha = f. \qquad (1.12)$$

Unlike the $\ell_0$ norm, the $\ell_1$ norm is convex. As a result, $(P_1)$ can be solved efficiently
using standard "off the shelf" optimization algorithms. The $\ell_1$-norm can also be viewed
as a "sparsity norm" which, among the vectors that meet the constraints, will favor those
with a few large coefficients and many small coefficients over those where the coefficient
magnitudes are approximately equal [7].

A beautiful result in [10] actually shows that if $f$ has a sparse decomposition $\alpha$ supported
on $\Gamma$ with

$$|\Gamma| < \frac{1}{2}(1 + \mu^{-1}), \qquad (1.13)$$

then the minimizer of $(P_1)$ is unique and is equal to the minimizer of $(P_0)$ ([12] improves
the constant in (1.13) from 1/2 to $\approx .9142$). In these situations, we can replace the highly
non-convex program $(P_0)$ with the much tamer (and convex) $(P_1)$.
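
To make the replacement of $(P_0)$ by $(P_1)$ concrete, here is a minimal sketch of Basis Pursuit as a linear program. It rests on several assumptions that are ours and not the paper's: we use a real-valued pair of bases (spikes and an orthonormal DCT-II) instead of the complex time-frequency dictionary, so that $(P_1)$ becomes a linear program; we solve it with SciPy's `linprog`; and the sizes ($N = 256$, five nonzero coefficients) are chosen so that the worst-case condition (1.13) holds, since here $\mu \le \sqrt{2/N}$.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
N = 256

# Orthonormal DCT-II matrix (rows are cosine atoms); a real stand-in for the
# Fourier basis, used here only so that (P1) is a linear program.
n = np.arange(N)
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
C[0, :] /= np.sqrt(2.0)

Phi = np.hstack([np.eye(N), C.T])        # N x 2N dictionary: spikes, then cosines

# Sparse coefficients: 3 spikes and 2 cosines (hypothetical sizes).
alpha = np.zeros(2 * N)
support = np.concatenate([rng.choice(N, 3, replace=False),
                          N + rng.choice(N, 2, replace=False)])
alpha[support] = rng.standard_normal(support.size)
f = Phi @ alpha

# (P1) as an LP: write alpha = p - q with p, q >= 0 and minimize sum(p + q).
cost = np.ones(4 * N)
res = linprog(cost, A_eq=np.hstack([Phi, -Phi]), b_eq=f,
              bounds=(0, None), method="highs")
alpha_hat = res.x[:2 * N] - res.x[2 * N:]

print(res.status, np.max(np.abs(alpha_hat - alpha)))
```

The splitting $\alpha = p - q$ is a standard device for real-valued $\ell_1$ problems; for complex coefficients, as in the time-frequency dictionary, $(P_1)$ is a second-order cone program rather than a linear program.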

We now review a few applications of these types of ideas. 



• Geometric Separation. Suppose we have a dataset and one wishes to separate
point-like structures, from filamentary (edge-like) structures, from sheet-like structures. In
2 dimensions, for example, we might imagine synthesizing a signal as a superposition
of wavelets and curvelets which are ideally adapted to represent point-like and
curve-like structures respectively. Delicate space/orientation uncertainty principles show
that the minimum $\ell_1$-norm decomposition in this combined dictionary automatically
separates point and curve-singularities; the wavelet component in the decomposition
(1.12) accurately captures all the pointwise singularities, while the curvelet component
captures all the edge curves. We refer to [8] for theoretical developments and to [22]
for numerical experiments.

• Texture-edges separation. Suppose now that we have an image we wish to decompose
as a sum of a cartoon-like geometric part plus a texture part. The idea again is to use
curvelets to represent the geometric part of the image and local cosines to represent
the texture part. These ideas have recently been tested in practical settings, with
spectacular success [23] (see also [19] for earlier and related ideas).

In a different direction, the QRUP is also implicit in some of our own work on the exact
reconstruction of sparse signals from vastly undersampled frequency information [3]. Here,
we wish to reconstruct a signal $f \in \mathbb{C}^N$ from the data of only $|\Omega|$ random frequency
samples. The surprising result is that although most of the information is missing, one can
still reconstruct $f$ exactly provided that $f$ is sparse. Suppose $|\Omega|$ obeys the oversampling
relation

$$|\Omega| \asymp |T| \cdot \log N$$

with $T := \operatorname{supp} f$. Then with overwhelming probability, the object $f$ (digital image, signal,
and so on) is the exact and unique solution of the convex program that searches, among all
signals that are consistent with the data, for that with minimum $\ell_1$-norm. We will draw
on the tools developed in the earlier work, making the QRUP explicit and applying it
to the problem of searching for sparse decompositions.

1.5 Innovations 

Nearly all the existing literature on uncertainty relations and their consequences focuses on
worst case scenarios, compare (1.2) and (1.13). What is new here is the development of
probabilistic models which show that the performance of Basis Pursuit in an overwhelmingly
large majority of situations is actually very different than that predicted by the overly
"pessimistic" bounds (1.13). For the time-frequency dictionary, we will see that if a
representation $\alpha$ (with spike locations $T$ and sinusoidal frequencies $\Omega$) of a signal $f$ exists
with

$$|T| + |\Omega| \asymp N/\sqrt{\log N},$$

then $\alpha$ is the sparsest representation of $f$ almost all of the time. If in addition, $T$ and $\Omega$
satisfy

$$|T| + |\Omega| \asymp N/\log N, \qquad (1.14)$$

then $\alpha$ can be recovered by solving the convex program $(P_1)$. In fact, numerical simulations
reported in Section 6 suggest that (1.14) is far closer to the empirical behavior than (1.13),
see also [9]. We show that similar results also hold for general pairs of bases $\Phi_1, \Phi_2$.

As discussed earlier, there is by now a well-established machinery that allows turning
uncertainty relations into statements about the ability to find sparse decompositions. We
would like to point out that our results (1.14) are not an automatic consequence of the
uncertainty relation (1.7) together with these existing ideas. Instead, our analysis relies on
the study of eigenvalues of random matrices which, of course, is completely new.

1.6 Organization of the paper 

In Section 2 we develop a probability model that shall be used throughout the paper to 
formulate our results. In Section 3, we will establish uncertainty relations such as (1.8). 
Sections 4 and 5 will prove uniqueness and equality of the $(P_0)$ and $(P_1)$ programs. In the
case where the basis pair $(\Phi_1, \Phi_2)$ is the time-frequency dictionary (Section 4), we will be
very careful in calculating the constants appearing in the bounds. We will be somewhat less 
precise in the general case (Section 5), and will forgo explicit calculation of constants. We 
report on numerical experiments in Section 6 and close the paper with a short discussion 
(Section 7). 

2 A Probability Model for $\Gamma_1, \Gamma_2$

To state our results precisely, we first need to specify a probabilistic model. We let $I_1$ and
$I_2$ be two independent Bernoulli sequences with parameters $p_T$ and $p_\Omega$ respectively

$$I_1(t) = 1 \text{ with probability } p_T, \qquad I_2(\omega) = 1 \text{ with probability } p_\Omega,$$

where $t, \omega = 0, \ldots, N-1$, and define the support sets for the spikes and sinusoids (and in
general for the bases $\Phi_1$ and $\Phi_2$) as

$$T = \{t \text{ s.t. } I_1(t) = 1\}, \qquad \Omega = \{\omega \text{ s.t. } I_2(\omega) = 1\}. \qquad (2.1)$$

If both $p_T$ and $p_\Omega$ are not too small, an application of the standard large deviations
inequality shows us that our model is approximately equivalent to sampling $\mathbf{E}|T| = p_T \cdot N$
spike locations and $\mathbf{E}|\Omega| = p_\Omega \cdot N$ frequency locations uniformly at random.
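
The Bernoulli model (2.1) is straightforward to simulate. The sketch below is only an illustration; the values of $N$, $p_T$ and $p_\Omega$ and the random seed are hypothetical choices of ours.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 4096
p_T, p_Omega = 0.05, 0.03               # hypothetical Bernoulli parameters

# Independent Bernoulli sequences I_1 and I_2 of model (2.1).
I1 = rng.random(N) < p_T                # I1[t] is True with probability p_T
I2 = rng.random(N) < p_Omega            # I2[w] is True with probability p_Omega

T = np.flatnonzero(I1)                  # spike locations
Omega = np.flatnonzero(I2)              # frequency locations

print(len(T), p_T * N)                  # |T| against its expectation
print(len(Omega), p_Omega * N)          # |Omega| against its expectation
```

In such draws the observed sizes stay close to their expectations $p_T \cdot N$ and $p_\Omega \cdot N$, which is the concentration the large deviations inequality quantifies.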

As we will see in the next section, the robust uncertainty principle holds, with overwhelming
probability, over sets $T$ and $\Omega$ randomly sampled as above. Our estimates are quantitative
and introduce sufficient conditions so that the probability of "failure" be arbitrarily
small, i.e. less than $O(N^{-\beta})$ for some arbitrary $\beta > 0$. As a consequence, we will always
assume that

$$\min(\mathbf{E}|T|, \mathbf{E}|\Omega|) \ge 4(\beta + 1) \cdot \log N \qquad (2.2)$$

as otherwise, one would have to consider situations in which $T$ or $\Omega$ (or both) are empty
sets, a situation of rather limited interest. We also note that for $p_T$ and $p_\Omega$ as above, we
have

$$P(|T| > 2 p_T \cdot N) \le N^{-\beta}, \qquad (2.3)$$

as this follows from the well-known large deviation bound [1]

$$P(|T| > \mathbf{E}|T| + t) \le \exp\left(-\frac{t^2}{2\mathbf{E}|T| + 2t/3}\right).$$

Further, to establish sparse approximation bounds (Section 4), we will also introduce a
probability model on the "active" coefficients. Given a pair $(T, \Omega)$, we sample the coefficient
vector $\{\alpha(\gamma), \gamma \in \Gamma\}$ from a distribution with identically and independently distributed
coordinates; we also impose that each $\alpha(\gamma)$ be drawn from a continuous probability
distribution that is circularly symmetric in the complex plane; that is, the phase of $\alpha(\gamma)$ is
uniformly distributed on $[0, 2\pi)$.

3 Quantitative Robust Uncertainty Principles

Equipped with the probability model (2.1), we now introduce our uncertainty relations. To
state our result, we make use of the standard notation $o(1)$ to indicate a numerical term
tending to 0 as $N$ goes to infinity.

Theorem 3.1 Assume the parameters in the model (2.1) obey

$$2\sqrt{\mathbf{E}|T| \cdot \mathbf{E}|\Omega|} \le \mathbf{E}|T| + \mathbf{E}|\Omega| \le \frac{N}{\sqrt{(\beta + 1)\log N}} \, (\rho_0/2 + o(1)), \qquad \rho_0 = .7614 \qquad (3.1)$$

(we will assume throughout the paper that $\beta \ge 1$ and $N \ge 512$) and let $(T, \Omega)$ be a randomly
sampled support pair. Then with probability at least $1 - O(\log N \cdot N^{-\beta})$, every signal $f$
supported on $T$ in the time domain has most of its energy in the frequency domain outside
of $\Omega$

$$\|\hat f \cdot 1_\Omega\|^2 \le \frac{\|f\|^2}{2};$$

and likewise, every signal $f$ supported on $\Omega$ in the frequency domain has most of its energy
in the time domain outside of $T$

$$\|f \cdot 1_T\|^2 \le \frac{\|f\|^2}{2}.$$

As a result, it is impossible to find a signal $f$ supported on $T$ whose discrete Fourier
transform $\hat f$ is supported on $\Omega$. For finite sample sizes $N$, we can select the parameters in (3.1)
as

$$\mathbf{E}|T| + \mathbf{E}|\Omega| \le \frac{.2660\, N}{\sqrt{(\beta + 1)\log N}}.$$

To establish this result, we introduce (as in [3]) the $|T| \times |T|$ auxiliary matrix $H_T$

$$H_T(t, t') = \begin{cases} 0, & t = t' \\ \sum_{\omega \in \Omega} e^{i 2\pi \omega (t - t')/N}, & t \ne t'. \end{cases} \qquad (3.2)$$

The following lemma effectively says that the eigenvalues of $H_T$ are small compared to $N$.

Lemma 3.1 Fix $q$ in $(0, 1)$ and suppose that

$$p_T + p_\Omega \le \rho_0 \cdot \frac{q}{\sqrt{(\beta + 1)\log N}}, \qquad \rho_0 = .7614.$$

Then the matrix $H_T$ obeys

$$P(\|H_T\| > qN) \le (\beta + 1) \log N \cdot N^{-\beta}.$$

Proof The Markov inequality gives

$$P(\|H_T\| > qN) \le \frac{\mathbf{E}\|H_T^n\|_F^2}{q^{2n} N^{2n}}, \qquad \text{for all } n \ge 1. \qquad (3.3)$$

Recall next that the Frobenius norm $\|\cdot\|_F$ dominates the operator norm, $\|H_T\| \le \|H_T\|_F$.
This fact allows us to leverage results from [3], which derives bounds for the conditional
expectation $\mathbf{E}[\|H_T^n\|_F^2 \mid T]$ (where the expectation is over $\Omega$ for a fixed $T$):

$$\mathbf{E}[\|H_T^n\|_F^2 \mid T] \le (2n) \left(\frac{(1 + \sqrt{5})^2}{2e(1 - p_\Omega)}\right)^n n^n\, |T|^{n+1}\, p_\Omega^n\, N^n.$$

Our assumption about the size of $p_T + p_\Omega$ assures that $p_\Omega \le .12$ so that $(1 + \sqrt{5})^2/(2(1 - p_\Omega)) \le 6$,
whence

$$\mathbf{E}[\|H_T^n\|_F^2 \mid T] \le 2n\, (6/e)^n\, n^n\, |T|^{n+1}\, p_\Omega^n\, N^n. \qquad (3.4)$$

We will argue below that for $n \le (\beta + 1) \log N$ and $p_T$ obeying (2.2)

$$\mathbf{E}[|T|^{n+1}] \le 1.15^{n+1}\, [\mathbf{E}|T|]^{n+1} = 1.15^{n+1}\, (p_T N)^{n+1}. \qquad (3.5)$$

Since $p_T \le .25$, we established

$$\mathbf{E}\|H_T^n\|_F^2 \le (6 \times 1.15/e)^n\, n^{n+1} \cdot p_T^n\, p_\Omega^n\, N^{2n+1}.$$

Observe now that together with $\sqrt{p_T p_\Omega} \le (p_T + p_\Omega)/2$, this gives

$$P(\|H_T\| > qN) \le \left(\frac{p_T + p_\Omega}{\rho_0\, q}\right)^{2n} e^{-n}\, n^{n+1}\, N, \qquad \rho_0 = 2/\sqrt{6 \times 1.15} = .7614. \qquad (3.6)$$

We now specialize (3.6) and take $n = \lceil (\beta + 1) \log N \rceil$ where $\lceil x \rceil$ is the smallest integer
greater than or equal to $x$. Then if $p_T + p_\Omega$ obeys (3.1),

$$P(\|H_T\| > qN) \le [(\beta + 1) \log N + 1] \cdot N^{-\beta}, \qquad (3.7)$$

as claimed. ■

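We now return to (3.5) and write $|T|$ as

$$|T| = \mathbf{E}|T| \cdot (1 + Y), \qquad Y = \frac{|T| - \mathbf{E}|T|}{\mathbf{E}|T|}.$$

Then

$$\mathbf{E}|T|^{n+1} = (\mathbf{E}|T|)^{n+1} \cdot \mathbf{E}(1 + Y)^{n+1} \le (\mathbf{E}|T|)^{n+1} \cdot \mathbf{E}[\exp((n + 1)Y)].$$

Observe that $Y$ is an affine function of a sum of independent Bernoulli random variables.
Standard calculations then give

$$\mathbf{E}[\exp(nY)] = e^{-n} \left(1 + p_T (e^{\lambda} - 1)\right)^N, \qquad \lambda = n/(N p_T).$$

Since $1 + x \le e^x$, the right-hand side is at most $\exp\left(n\,[\lambda^{-1}(e^{\lambda} - 1) - 1]\right)$.
Recall the assumption (2.2) which implies $\lambda \le 1/4$ which in turn gives $\lambda^{-1}(e^{\lambda} - 1) - 1 \le
\log 1.15$. The claim follows.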

We would like to remark that (3.5) might be considerably improved when $\mathbf{E}|T| = p_T \cdot N$
is much larger than $n$ since in that case, the binomial will have enhanced concentration
around its mean. For example,

$$\mathbf{E}|T|^{n+1} \le 2 \cdot [\mathbf{E}|T|]^{n+1}$$

in the event where $n \le \rho \cdot \sqrt{p_T N}$ for some positive constant $\rho$ that the above method allows
us to calculate explicitly. This would of course lead to improved constants in (3.1) and in the
statement of Lemma 3.1. In this paper, we shall not pursue all these refinements so as not to
clutter the exposition.

Proof of Theorem 3.1 Let $f \in \mathbb{C}^N$ be supported on $T$; as such, $R_T^* R_T f = f$, where $R_T$
is the restriction operator to $T$. Put $F_{\Omega T} = R_\Omega F R_T^*$. We have

$$\|\hat f \cdot 1_\Omega\|^2 = \|F_{\Omega T} R_T f\|^2 \le \|F_{\Omega T}\|^2 \cdot \|f\|^2,$$

and since $\|F_{\Omega T}\|^2 = \|F_{\Omega T}^* F_{\Omega T}\|$, it will suffice to show that with high probability, the
largest eigenvalue of $F_{\Omega T}^* F_{\Omega T}$ is less than 1/2.

Using the definition of the auxiliary matrix in (3.2), it is not hard to verify the identity
$F_{\Omega T}^* F_{\Omega T} = \frac{|\Omega|}{N} I + \frac{1}{N} H_T$. Suppose that $p_T + p_\Omega$ obeys the condition in Lemma 3.1; then
except for a set of probability less than $O(\log N \cdot N^{-\beta})$,

$$\frac{|\Omega|}{N} \le 2 p_\Omega \le \frac{2 \rho_0\, q}{\sqrt{(\beta + 1)\log N}}, \qquad \text{and} \qquad \frac{1}{N}\|H_T\| \le q$$

and, therefore,

$$\|F_{\Omega T}^* F_{\Omega T}\| \le q \cdot \left(1 + \frac{2\rho_0}{\sqrt{(\beta + 1)\log N}}\right) = q(1 + o(1)). \qquad (3.8)$$

The theorem follows from taking $q = 1/2 + o(1)$. For the statement about finite sample
sizes, we observe that for $\beta \ge 1$ and $N \ge 512$, $2/\sqrt{(\beta + 1)\log N} \le .567$ and, therefore,
$\|F_{\Omega T}^* F_{\Omega T}\| \le 1/2$ provided that $q \le [2(1 + .567\rho_0)]^{-1}$. This establishes the first part of the
theorem.

By symmetry of the discrete Fourier transform, the claim about the size of $\|f \cdot 1_T\|$ for $\hat f$
supported on a random set $\Omega$ is proven exactly in the same way. This concludes the proof
of the theorem. ■
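
The two ingredients of this proof, the identity relating $F_{\Omega T}^* F_{\Omega T}$ to $H_T$ and the bound on the largest eigenvalue, can be checked numerically. The sketch below is an illustration under our own assumptions: NumPy, $N = 1024$, and supports of fixed size 20 drawn uniformly at random instead of the Bernoulli model.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 1024
T = np.sort(rng.choice(N, 20, replace=False))        # hypothetical |T| = 20
Omega = np.sort(rng.choice(N, 20, replace=False))    # hypothetical |Omega| = 20

# Partial Fourier matrix F_{Omega T} = R_Omega F R_T^*, with F as in (1.1).
F_OT = np.exp(-2j * np.pi * np.outer(Omega, T) / N) / np.sqrt(N)

# Auxiliary matrix (3.2): zero diagonal, sums over Omega off the diagonal.
diff = T[:, None] - T[None, :]
H_T = np.exp(2j * np.pi * diff[:, :, None] * Omega[None, None, :] / N).sum(axis=2)
np.fill_diagonal(H_T, 0.0)

gram = F_OT.conj().T @ F_OT
identity_ok = np.allclose(gram, (len(Omega) / N) * np.eye(len(T)) + H_T / N)

# Largest fraction of energy that f_hat can place on Omega when supp f is in T.
energy_fraction = np.linalg.norm(F_OT, 2) ** 2

print(identity_ok, energy_fraction)
```

For these sizes the fraction is below 1/2 for every draw, because the squared operator norm is at most the squared Frobenius norm $|T| \cdot |\Omega|/N \approx 0.39$; the content of Theorem 3.1 is that the same conclusion holds for far larger random supports.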



4 Robust UPs and Basis Pursuit: Spikes and Sinusoids 

As in [10, 12, 15], our uncertainty principles are directly applicable to finding sparse
approximations in redundant dictionaries. In this section, we look exclusively at the case of
spikes and sinusoids. We will leverage Theorem 3.1 in two different ways:

1. $\ell_0$-uniqueness: If $f \in \mathbb{C}^N$ has a decomposition $\alpha$ supported on $T \cup \Omega$ with $|T| + |\Omega| \asymp
(\log N)^{-1/2} N$, then with high probability, $\alpha$ is the sparsest representation of $f$.

2. Equivalence of $(P_0)$ and $(P_1)$: If $|T| + |\Omega| \asymp (\log N)^{-1} N$, then $(P_1)$ recovers $\alpha$ with
overwhelmingly large probability.

4.1 $\ell_0$-uniqueness



To illustrate that it is possible to do much better than (1.10), we first consider the case in
which $N$ is a prime integer. Tao [25] derived the following exact, sharp discrete uncertainty
principle.

Lemma 4.1 [25] Suppose that the sample size $N$ is a prime integer. Then

$$|\operatorname{supp} f| + |\operatorname{supp} \hat f| > N, \qquad \forall f \in \mathbb{C}^N.$$

Using Lemma 4.1, a strong $\ell_0$-uniqueness result immediately follows:

Corollary 4.1 Let $T$ and $\Omega$ be subsets of $\{0, \ldots, N-1\}$ for $N$ prime, and let $\alpha$ (with
$\Phi\alpha = f$) be a vector supported on $\Gamma = T \cup \Omega$ such that

$$|T| + |\Omega| \le N/2.$$

Then the solution to $(P_0)$ is unique and is equal to $\alpha$. Conversely, there exist distinct vectors
$\alpha_0, \alpha_1$ obeying $|\operatorname{supp} \alpha_0|, |\operatorname{supp} \alpha_1| \le N/2 + 1$ and $\Phi\alpha_0 = \Phi\alpha_1$.

Proof As we have seen in the introduction, one direction is trivial. If $\alpha + \delta_0$ is another
decomposition, then $\delta_0$ is of the form $\delta_0 := (\delta, -\hat\delta)$. Lemma 4.1 gives $\|\delta_0\|_{\ell_0} > N$ and thus
$\|\alpha + \delta_0\|_{\ell_0} \ge \|\delta_0\|_{\ell_0} - \|\alpha\|_{\ell_0} > N/2$. Therefore, $\|\alpha + \delta_0\|_{\ell_0} > \|\alpha\|_{\ell_0}$.

For the converse, we know that since $\Phi$ has rank at most $N$, we can find $\delta \ne 0$ with
$|\operatorname{supp} \delta| = N + 1$ such that $\Phi\delta = 0$. (Note that it is of course possible to construct such $\delta$'s
for any support of size greater than $N$.) Consider a partition of $\operatorname{supp} \delta = \Gamma_0 \cup \Gamma_1$ where $\Gamma_0$
and $\Gamma_1$ are two disjoint sets with $|\Gamma_0| = N/2 + 1$ and $|\Gamma_1| = N/2$, say. The claim follows by
taking $\alpha_0 = \delta|_{\Gamma_0}$ and $\alpha_1 = -\delta|_{\Gamma_1}$. ■

A slightly weaker statement addresses arbitrary sample sizes. 

Theorem 4.1 Let $f = \Phi\alpha$ be a signal with support set $\Gamma = T \cup \Omega$ and coefficients $\alpha$
sampled as in Section 2, and with parameters obeying (3.1). Then with probability at least
$1 - O(\log N \cdot N^{-\beta})$, the solution to $(P_0)$ is unique and equal to $\alpha$.

To prove Theorem 4.1, we shall need the following lemma:

Lemma 4.2 Suppose $T$ and $\Omega$ are fixed subsets of $\{0, \ldots, N-1\}$, put $\Gamma = T \cup \Omega$, and let
$\Phi_\Gamma := \Phi R_\Gamma^*$ be the $N \times (|T| + |\Omega|)$ matrix $\Phi_\Gamma = (R_T^* \;\; F^* R_\Omega^*)$. Then

$$|\Gamma| < 2N \quad \Longrightarrow \quad \dim(\mathrm{Null}(\Phi_\Gamma)) < \frac{|\Gamma|}{2}.$$

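Proof Obviously,

$$\dim(\mathrm{Null}(\Phi_\Gamma)) = \dim(\mathrm{Null}(\Phi_\Gamma^* \Phi_\Gamma)),$$

and we then write the $|\Gamma| \times |\Gamma|$ matrix $\Phi_\Gamma^* \Phi_\Gamma$ as

$$\Phi_\Gamma^* \Phi_\Gamma = \begin{pmatrix} I & F_{\Omega T}^* \\ F_{\Omega T} & I \end{pmatrix}$$

with $F_{\Omega T}$ the partial Fourier transform from $T$ to $\Omega$, $F_{\Omega T} := R_\Omega F R_T^*$. The dimension of
the nullspace of $\Phi_\Gamma^* \Phi_\Gamma$ is simply the number of eigenvalues of $\Phi_\Gamma^* \Phi_\Gamma$ that are zero. Put

$$G := I - \Phi_\Gamma^* \Phi_\Gamma = -\begin{pmatrix} 0 & F_{\Omega T}^* \\ F_{\Omega T} & 0 \end{pmatrix}, \qquad \text{so that} \qquad G^* G = \begin{pmatrix} F_{\Omega T}^* F_{\Omega T} & 0 \\ 0 & F_{\Omega T} F_{\Omega T}^* \end{pmatrix}.$$

Letting $\lambda_j(\cdot)$ denote the $j$th largest eigenvalue of a matrix, observe that $\lambda_j(\Phi_\Gamma^* \Phi_\Gamma) =
1 - \lambda_j(G)$, and since $G$ is symmetric

$$\operatorname{Tr}(G^* G) = \lambda_1^2(G) + \lambda_2^2(G) + \cdots + \lambda_{|T| + |\Omega|}^2(G). \qquad (4.1)$$

We also have that $\operatorname{Tr}(G^* G) = \operatorname{Tr}(F_{\Omega T}^* F_{\Omega T}) + \operatorname{Tr}(F_{\Omega T} F_{\Omega T}^*)$, so the eigenvalues in (4.1) will
appear in duplicate,

$$\lambda_1^2(G) + \lambda_2^2(G) + \cdots + \lambda_{|T| + |\Omega|}^2(G) = 2 \cdot \left(\lambda_1(F_{\Omega T}^* F_{\Omega T}) + \cdots + \lambda_{|T|}(F_{\Omega T}^* F_{\Omega T})\right). \qquad (4.2)$$

We calculate

$$\operatorname{Tr}(F_{\Omega T}^* F_{\Omega T}) = \sum_{\omega \in \Omega} \sum_{t \in T} \left|\frac{1}{\sqrt{N}}\, e^{-i 2\pi \omega t/N}\right|^2 = \frac{|T| \cdot |\Omega|}{N}$$

and thus

$$\operatorname{Tr}(G^* G) = \frac{2\,|T| \cdot |\Omega|}{N}. \qquad (4.3)$$

Observe now that for the null space of $\Phi_\Gamma^* \Phi_\Gamma$ to have dimension $K$, at least $K$ of the
eigenvalues in (4.2) must have magnitude greater than or equal to 1. As a result

$$\operatorname{Tr}(G^* G) < 2K \quad \Longrightarrow \quad \dim(\mathrm{Null}(\Phi_\Gamma^* \Phi_\Gamma)) < K.$$

Using the fact that $(a + b) \ge 4ab/(a + b)$ (arithmetic mean dominates geometric mean), we
see that if $|T| + |\Omega| < 2N$, then $2|T| \cdot |\Omega|/N < |T| + |\Omega|$, which implies that (4.1) is less than
$|T| + |\Omega|$ and hence that $\dim(\mathrm{Null}(\Phi_\Gamma^* \Phi_\Gamma))$ is less than $(|T| + |\Omega|)/2$. ■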
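
The trace computation (4.3) and the conclusion of Lemma 4.2 can be sanity-checked on a small instance. The sketch below is illustrative only; the sizes ($N = 64$, $|T| = 10$, $|\Omega| = 12$) and the seed are hypothetical choices of ours.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 64
T = np.sort(rng.choice(N, 10, replace=False))
Omega = np.sort(rng.choice(N, 12, replace=False))

s = np.arange(N)
spikes = np.eye(N)[:, T]                                              # R_T^*
sinusoids = np.exp(2j * np.pi * np.outer(s, Omega) / N) / np.sqrt(N)  # F^* R_Omega^*
Phi_G = np.hstack([spikes, sinusoids])        # N x (|T| + |Omega|) matrix Phi_Gamma

gram = Phi_G.conj().T @ Phi_G
G = np.eye(gram.shape[0]) - gram
trace_GG = np.trace(G.conj().T @ G).real

# Dimension of the null space of Phi_Gamma via the numerical rank.
nullity = Phi_G.shape[1] - np.linalg.matrix_rank(Phi_G)

print(trace_GG, 2 * len(T) * len(Omega) / N)   # both sides of (4.3)
print(nullity, (len(T) + len(Omega)) / 2)      # Lemma 4.2: nullity < |Gamma|/2
```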

Proof of Theorem 4.1 We assume $\Gamma$ is selected such that $\Phi_\Gamma$ has full rank. This happens
if $\|F_{\Omega T}^* F_{\Omega T}\| < 1$ and Theorem 3.1 states that this occurs with probability at least
$1 - O(\log N \cdot N^{-\beta})$.

Given this $\Gamma$, the (continuous) probability distribution on the $\{\alpha(\gamma), \gamma \in \Gamma\}$ induces a
continuous probability distribution on $\mathrm{Range}(\Phi_\Gamma)$. We will show that for every $\Gamma' \ne \Gamma$ with
$|\Gamma'| \le |\Gamma|$

$$\dim(\mathrm{Range}(\Phi_{\Gamma'}) \cap \mathrm{Range}(\Phi_\Gamma)) < |\Gamma|. \qquad (4.4)$$

As such, the set of signals in $\mathrm{Range}(\Phi_\Gamma)$ that have expansions on a $\Gamma' \ne \Gamma$ that are at least
as sparse as their expansions on $\Gamma$ is a finite union of subspaces of dimension strictly smaller
than $|\Gamma|$. This set has measure zero as a subset of $\mathrm{Range}(\Phi_\Gamma)$, and hence the probability of
observing such a signal is zero.
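To show (4.4), we may assume that $\Phi_{\Gamma'}$ also has full rank, since if $\dim(\mathrm{Range}(\Phi_{\Gamma'})) < |\Gamma'|$,
then (4.4) is certainly true. For a set of coefficients $\alpha$ supported on $\Gamma$ and $\alpha'$ supported on
$\Gamma'$ to have the same image under $\Phi$, $\Phi\alpha = \Phi\alpha'$ (or equivalently $\Phi_\Gamma R_\Gamma \alpha = \Phi_{\Gamma'} R_{\Gamma'} \alpha'$), two
things must be true:

1. $\alpha$ and $\alpha'$ must agree on $\Gamma \cap \Gamma'$. This is a direct consequence of $\Phi_{\Gamma'}$ being full rank
(its columns are linearly independent).

2. There is a $\delta \in \mathrm{Null}(\Phi)$ such that $\alpha' = \alpha + \delta$. Of course,

$$\delta(\gamma) = 0, \qquad \gamma \in (\Gamma \cup \Gamma')^c.$$

By item 1 above, we will also have

$$\delta(\gamma) = 0, \qquad \gamma \in \Gamma \cap \Gamma'.$$

Thus, $\operatorname{supp} \delta \subset (\Gamma \setminus \Gamma') \cup (\Gamma' \setminus \Gamma)$.

In light of these observations, we see that for $\dim(\mathrm{Range}(\Phi_{\Gamma'}) \cap \mathrm{Range}(\Phi_\Gamma)) = |\Gamma|$ we need
that for every $\alpha$ supported on $\Gamma$, there is a $\delta \in \mathrm{Null}(\Phi)$ that is supported on $(\Gamma \setminus \Gamma') \cup (\Gamma' \setminus \Gamma)$
such that

$$\delta(\gamma) = -\alpha(\gamma), \qquad \gamma \in \Gamma \setminus \Gamma'.$$

In other words, we need

$$\dim(\mathrm{Null}(\Phi_{(\Gamma \setminus \Gamma') \cup (\Gamma' \setminus \Gamma)})) \ge |\Gamma \setminus \Gamma'|.$$

However, Lemma 4.2 tells us

$$\dim(\mathrm{Null}(\Phi_{(\Gamma \setminus \Gamma') \cup (\Gamma' \setminus \Gamma)})) < \frac{|\Gamma \setminus \Gamma'| + |\Gamma' \setminus \Gamma|}{2} \le |\Gamma \setminus \Gamma'|,$$

since $|\Gamma'| \le |\Gamma|$. Hence $\dim(\mathrm{Range}(\Phi_{\Gamma'}) \cap \mathrm{Range}(\Phi_\Gamma)) < |\Gamma|$, and the theorem follows. ■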



4.2 Recovery via $\ell_1$-minimization

The problem $(P_0)$ is combinatorial and solving it directly is infeasible even for modest-sized
signals. This is the reason why we consider instead the convex relaxation (1.12).

Theorem 4.2 Suppose $f = \Phi\alpha$ is a random signal sampled as in Section 2 and with
parameters obeying

$$\mathbf{E}|T| + \mathbf{E}|\Omega| \le \frac{N}{(\beta + 1)\log N} \cdot (1/8 + o(1)). \qquad (4.5)$$

Then with probability at least $1 - O((\log N) \cdot N^{-\beta})$, the solutions of $(P_1)$ and $(P_0)$ are
identical and equal to $\alpha$.



In addition to being computationally tractable, there are analytical advantages which come
with $(P_1)$, as our arguments will essentially rely on a strong duality result [2]. In fact, the
next section shows that $\alpha$ is a unique minimizer of $(P_1)$ if and only if there exists a "dual
vector" $S$ satisfying certain properties. Here, the crucial part of the analysis relies on the
fact that "partial" Fourier matrices $F_{\Omega T} := R_\Omega F R_T^*$ have very well-behaved eigenvalues,
hence the connection with robust uncertainty principles.

4.2.1 $\ell_1$-duality

For a vector of coefficients $\alpha \in \mathbb{C}^{2N}$ supported on $\Gamma := \Gamma_1 \cup \Gamma_2$, define the 'sign' vector
$\operatorname{sgn} \alpha$ by $(\operatorname{sgn} \alpha)(\gamma) := \alpha(\gamma)/|\alpha(\gamma)|$ for $\gamma \in \Gamma$ and $(\operatorname{sgn} \alpha)(\gamma) = 0$ otherwise. We say that
$S \in \mathbb{C}^N$ is a dual vector associated to $\alpha$ if $S$ obeys

$$(\Phi^* S)(\gamma) = (\operatorname{sgn} \alpha)(\gamma), \qquad \gamma \in \Gamma \qquad (4.6)$$
$$|(\Phi^* S)(\gamma)| < 1, \qquad \gamma \in \Gamma^c. \qquad (4.7)$$

With this notion, we introduce a strong duality result which is similar to that presented in
[3], see also [14].

Lemma 4.3 Consider a vector $\alpha \in \mathbb{C}^{2N}$ with support $\Gamma = \Gamma_1 \cup \Gamma_2$ and put $f = \Phi\alpha$.

• Suppose that there exists a dual vector and that $\Phi_\Gamma$ has full rank. Then the minimizer
$\alpha^\sharp$ to the problem $(P_1)$ is unique and equal to $\alpha$.

• Conversely, if $\alpha$ is the unique minimizer of $(P_1)$, then there exists a dual vector.

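Proof The program dual to $(P_1)$ is

$$(D1) \qquad \max_S \; \mathrm{Re}(S^* f) \quad \text{subject to} \quad \|\Phi^* S\|_{\ell_\infty} \le 1. \qquad (4.8)$$

It is a classical result in convex optimization that if $\tilde\alpha$ is a minimizer of $(P_1)$, then $\mathrm{Re}(S^* \Phi\tilde\alpha) \le
\|\tilde\alpha\|_{\ell_1}$ for all feasible $S$. Since the primal is a convex functional subject only to equality
constraints, we will have $\mathrm{Re}(S^* \Phi\tilde\alpha) = \|\tilde\alpha\|_{\ell_1}$ if and only if $S$ is a maximizer of (D1) [2, Chap. 5].

First, suppose that $\Phi R_\Gamma^*$ has full rank and that a dual vector $S$ exists. Set $P = \Phi^* S$. Then

$$\mathrm{Re}\langle \Phi\alpha, S \rangle = \mathrm{Re}\langle \alpha, \Phi^* S \rangle = \mathrm{Re} \sum_{\gamma = 0}^{2N-1} \overline{P(\gamma)}\, \alpha(\gamma) = \mathrm{Re} \sum_{\gamma \in \Gamma} \overline{\operatorname{sgn} \alpha(\gamma)}\, \alpha(\gamma) = \|\alpha\|_{\ell_1}$$

and $\alpha$ is a minimizer of $(P_1)$. Since $|P(\gamma)| < 1$ for $\gamma \in \Gamma^c$, all minimizers of $(P_1)$ must be
supported on $\Gamma$. But $\Phi R_\Gamma^*$ has full rank, so $\alpha$ is the unique minimizer.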

For the converse, suppose that a is the unique minimizer of (Pi). Then there exists at least 
one S such that with P = &*S, WPWe^. < 1 and S* f = WaW^. Then 

Hall^ = Re(<£a, S) 
= Re(a,$*S) 

= Re^P(7>(7)- 
Since |P( 7 )| < 1, equality above can only hold if P( 7 ) = sgna( 7 ) for 7 G T. 
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We will argue geometrically that for one of these S, we have |P(7)| < 1 for 7 G T c . Let V 
be the hyperplane {d G C 2N : <S>d = /}, and let B be the polytope B = {d £ C 2N : \\d\\ ei < 
Hall^}. Each of the S above corresponds to a hyperplane H$ = {d : Re(d, &*S) = } 
that contains V (since Re(/, S) = IMI^) and which defines a halfspace {d : Re(d, &*S) < 
1} that contains B (and for each such hyperplane, a S exists that describes it as such). 
Since a is the unique minimizer, for one of these S' , the hyperplane Hg/ intersects B only 
on the minimal facet {d : suppd C T}, and we will have P(7) < 1, 7 € T c . ■ 

Thus to show that $(P_1)$ recovers a representation $\alpha$ from a signal observation $\Phi\alpha$, it is
enough to prove that a dual vector with properties (4.6)-(4.7) exists.

As a sufficient condition for the equivalence of $(P_0)$ and $(P_1)$, we construct the minimum
energy dual vector

\[
\min \|P\|_{\ell_2} \quad \text{subject to} \quad P \in \operatorname{Range}(\Phi^*) \ \text{and} \ P(\gamma) = \operatorname{sgn}(\alpha)(\gamma), \ \forall \gamma \in \Gamma.
\]

This minimum energy vector is "small" in the $\ell_2$ sense, and we hope that it also obeys the
inequality constraints (4.7). Note that $\|P\|_{\ell_2}^2 = 2\|S\|_{\ell_2}^2$, and the problem is thus the same as
finding that $S \in \mathbb{C}^N$ with minimum norm and obeying the constraint above; the solution is
classical and given by

\[
S = \Phi_\Gamma (\Phi_\Gamma^* \Phi_\Gamma)^{-1} R_\Gamma \operatorname{sgn}\alpha,
\]

where again, $R_\Gamma$ is the restriction operator to $\Gamma$. Setting $P = \Phi^* S$, we need to establish
that

1. $\Phi_\Gamma^* \Phi_\Gamma$ is invertible (so that $S$ exists), and if so

2. $|P(\gamma)| < 1$ for $\gamma \in \Gamma^c$.

The next section shows that for $|T| + |\Omega| \asymp N/\log N$, not only is $\Phi_\Gamma^* \Phi_\Gamma$ invertible with high
probability but in addition, the eigenvalues of $(\Phi_\Gamma^* \Phi_\Gamma)^{-1}$ are all less than two, say. These
size estimates will be very useful to show that $P$ is small componentwise.
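
The minimum energy construction can be tried numerically. The following is a minimal sketch, not the authors' code: it assumes the time-frequency dictionary $\Phi = (I \ F^*)$ with unit-norm sinusoids, replaces the paper's sampling model by a support of fixed size drawn uniformly at random, and uses random unit-modulus signs in place of $\operatorname{sgn}\alpha$. The names `time_frequency_dictionary` and `minimum_energy_dual` are hypothetical helpers introduced for illustration.

```python
import numpy as np

def time_frequency_dictionary(N):
    """N x 2N dictionary (I  F*): N spikes followed by N unit-norm complex sinusoids."""
    t = np.arange(N)
    sinusoids = np.exp(2j * np.pi * np.outer(t, t) / N) / np.sqrt(N)
    return np.hstack([np.eye(N, dtype=complex), sinusoids])

def minimum_energy_dual(Phi, support, signs):
    """Return P = Phi^* S with S = Phi_G (Phi_G^* Phi_G)^{-1} sgn(alpha)."""
    Phi_G = Phi[:, support]                    # columns indexed by Gamma
    gram = Phi_G.conj().T @ Phi_G              # Phi_G^* Phi_G
    S = Phi_G @ np.linalg.solve(gram, signs)   # minimum-norm S
    return Phi.conj().T @ S

rng = np.random.default_rng(0)                 # assumed seed, for reproducibility only
N, k = 256, 16                                 # assumed sizes; k < 2*sqrt(N) keeps the Gram matrix invertible
Phi = time_frequency_dictionary(N)
support = rng.choice(2 * N, size=k, replace=False)
signs = np.exp(2j * np.pi * rng.random(k))     # unit-modulus stand-ins for sgn(alpha)

P = minimum_energy_dual(Phi, support, signs)
off_support = np.setdiff1d(np.arange(2 * N), support)
interp_error = np.max(np.abs(P[support] - signs))   # P must equal sgn(alpha) on Gamma
max_off_support = np.max(np.abs(P[off_support]))    # condition 2 asks for a value below 1
print(interp_error, max_off_support)
```

The interpolation error checks the equality constraints on $\Gamma$; whether the off-support maximum falls below one is exactly the question the remainder of this section addresses.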

4.2.2 Invertibility 

Lemma 4.4 Fix $\beta \ge 1$ and the parameters as in (4.5). Then the matrix $\Phi_\Gamma^* \Phi_\Gamma$ is invertible
and obeys

\[
\|(\Phi_\Gamma^* \Phi_\Gamma)^{-1}\| = 1 + o(1)
\]

with probability exceeding $1 - O(\log N \cdot N^{-\beta})$.

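Proof We begin by recalling that with $F_{\Omega T}$ as before, $\Phi_\Gamma^* \Phi_\Gamma$ is given by

\[
\Phi_\Gamma^* \Phi_\Gamma = \begin{pmatrix} I & F_{\Omega T}^* \\ F_{\Omega T} & I \end{pmatrix}.
\]

Clearly, $\|(\Phi_\Gamma^* \Phi_\Gamma)^{-1}\| = 1/\lambda_{\min}(\Phi_\Gamma^* \Phi_\Gamma)$ and since
$\lambda_{\min}(\Phi_\Gamma^* \Phi_\Gamma) \ge 1 - \sqrt{\|F_{\Omega T}^* F_{\Omega T}\|}$, we have

\[
\|(\Phi_\Gamma^* \Phi_\Gamma)^{-1}\| \le \frac{1}{1 - \sqrt{\|F_{\Omega T}^* F_{\Omega T}\|}}.
\]

We then need to prove that $\|F_{\Omega T}^* F_{\Omega T}\| = o(1)$ with the required probability. This follows
from the conclusion of Lemma 3.1 which (4.5) allows to specialize to the value
$1/q = 8\rho_0 \sqrt{(\beta+1)\log N}$. Note that this gives more than what is claimed since

\[
\|(\Phi_\Gamma^* \Phi_\Gamma)^{-1}\| \le 1 + \frac{1}{8\rho_0 \sqrt{(\beta+1)\log N}} + O(1/\log N).
\]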



Remark. Note that Lemma 3.1 assures us that it is sufficient to take $\mathbf{E}|T| + \mathbf{E}|\Omega|$ of the
order of $N/\sqrt{\log N}$ (rather than of the order of $N/\log N$ as the Theorem states) and still
have invertibility with $\|(\Phi_\Gamma^* \Phi_\Gamma)^{-1}\| \le 2$, say. The reason why we actually need the stronger
condition will become apparent in the next subsection.
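
The link between the inverse Gram matrix and the partial Fourier matrix can be checked directly. The sketch below is an illustration under stated assumptions, not part of the original argument: $F_{\Omega T}$ is taken to be the normalized partial Fourier matrix from $T$ to $\Omega$, the supports have a fixed size of 8 each (instead of the paper's random sizes), and `gram_inverse_norm` is a hypothetical helper name. For a Gram matrix with identity diagonal blocks, the smallest eigenvalue equals one minus the largest singular value of the off-diagonal block.

```python
import numpy as np

def gram_inverse_norm(N, T, Omega):
    """Return (||(Phi_G^* Phi_G)^{-1}||, ||F_OmegaT||) for spike set T and frequency set Omega."""
    T, Omega = np.asarray(T), np.asarray(Omega)
    # Normalized partial Fourier matrix from T to Omega, of size |Omega| x |T|.
    F = np.exp(-2j * np.pi * np.outer(Omega, T) / N) / np.sqrt(N)
    gram = np.block([[np.eye(T.size), F.conj().T],
                     [F, np.eye(Omega.size)]])
    lam_min = np.linalg.eigvalsh(gram)[0]      # eigenvalues are returned in ascending order
    return 1.0 / lam_min, np.linalg.norm(F, 2)

rng = np.random.default_rng(1)                 # assumed seed
N = 256
T = rng.choice(N, size=8, replace=False)
Omega = rng.choice(N, size=8, replace=False)
inv_norm, F_norm = gram_inverse_norm(N, T, Omega)
print(inv_norm, 1.0 / (1.0 - F_norm))          # the two values should agree
```

With these sizes the Frobenius norm of the block is $\sqrt{64/256} = 1/2$, so the inverse norm cannot exceed two, in line with the size estimate quoted in the remark.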

4.2.3 Proof of Theorem 4.2 

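To prove our theorem, it remains to show that, with high probability, $|P(\gamma)| < 1$ on $\Gamma^c$.

Lemma 4.5 Under the hypotheses of Theorem 4.2, for each $\gamma \in \Gamma^c$

\[
\mathbf{P}(|P(\gamma)| \ge 1) \le 4N^{-(\beta+1)}.
\]

As a result,

\[
\mathbf{P}\Big( \max_{\gamma \in \Gamma^c} |P(\gamma)| \ge 1 \Big) \le 8N^{-\beta}.
\]
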
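Proof The image of the dual vector $P$ is given by

\[
P := \begin{pmatrix} P_1(t) \\ P_2(\omega) \end{pmatrix} = \Phi^* \Phi_\Gamma (\Phi_\Gamma^* \Phi_\Gamma)^{-1} R_\Gamma \operatorname{sgn}\alpha,
\]

where the matrix $\Phi^* \Phi_\Gamma$ may be expanded in the time and frequency subdomains as

\[
\Phi^* \Phi_\Gamma = \begin{pmatrix} R_T^* & F^* R_\Omega^* \\ F R_T^* & R_\Omega^* \end{pmatrix}.
\]

Consider first $P_1(t)$ for $t \in T^c$ and let $V_t \in \mathbb{C}^{|\Gamma|}$ be the conjugate transpose of the row of
the matrix $(R_T^* \ \ F^* R_\Omega^*)$ corresponding to index $t$. For $t \in T^c$, the row of $R_T^*$ with index $t$
is zero, and $V_t$ is then the $(|T| + |\Omega|)$-dimensional vector

\[
V_t = \Big( 0, \ \big\{ \tfrac{1}{\sqrt{N}} e^{-i 2\pi \omega t/N} \big\}_{\omega \in \Omega} \Big)^T.
\]

With this notation, $P_1(t)$ can be expressed as the inner product

\[
\begin{aligned}
P_1(t) &= \langle (\Phi_\Gamma^* \Phi_\Gamma)^{-1} R_\Gamma \operatorname{sgn}\alpha, V_t \rangle \\
&= \langle R_\Gamma \operatorname{sgn}\alpha, (\Phi_\Gamma^* \Phi_\Gamma)^{-1} V_t \rangle \\
&= \langle R_\Gamma \operatorname{sgn}\alpha, W \rangle
\end{aligned}
\]

where $W = (\Phi_\Gamma^* \Phi_\Gamma)^{-1} V_t$. The signs of $\alpha$ on $\Gamma$ are statistically independent of $\Gamma$ (and hence
of $W$) and, therefore, for a fixed support set $\Gamma$, $P_1(t)$ is a weighted sum of independent
complex-valued random variables

\[
P_1(t) = \sum_{\gamma \in \Gamma} X_\gamma
\]

with $\mathbf{E} X_\gamma = 0$ and $|X_\gamma| \le |W(\gamma)|$. Applying the complex Hoeffding inequality (see the
Appendix) gives a bound on the conditional distribution of $P_1(t)$

\[
\mathbf{P}(|P_1(t)| \ge 1 \mid \Gamma) \le 4 \exp\left( -\frac{1}{4\|W\|_2^2} \right).
\]
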
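Thus, it suffices to develop a bound on the magnitude of the vector $W$. Controlling the
eigenvalues of $(\Phi_\Gamma^* \Phi_\Gamma)^{-1}$ is essential here, as

\[
\|W\| \le \|(\Phi_\Gamma^* \Phi_\Gamma)^{-1}\| \cdot \|V_t\|. \tag{4.9}
\]

On the one hand, $\|V_t\| = \sqrt{|\Omega|/N}$ and as we have seen, size estimates about $|\Omega|$ give
$\|V_t\| \le \sqrt{2(\rho_T + \rho_\Omega)}$ with the desired probability. On the other hand, we have also seen
that $\|(\Phi_\Gamma^* \Phi_\Gamma)^{-1}\| \le 1 + o(1)$, also with the desired probability, and, therefore,

\[
\|W\|^2 \le 2 \cdot (1 + o(1)) \cdot (\rho_T + \rho_\Omega).
\]

This gives

\[
\mathbf{P}(|P_1(t)| \ge 1) \le 4 \exp\left( -\frac{1}{8(\rho_T + \rho_\Omega)(1 + o(1))} \right).
\]

Select $\rho_T + \rho_\Omega$ as in (4.5). Then

\[
\mathbf{P}(|P_1(t)| \ge 1) \le 4 \exp(-(\beta+1)\log N) \le 4N^{-(\beta+1)}
\]

and

\[
\mathbf{P}\Big( \max_{t \in T^c} |P_1(t)| \ge 1 \Big) \le 4N^{-\beta}.
\]
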

As we alluded to earlier, the bound on the size of each individual $P(t)$ that one would obtain
assuming that $\mathbf{E}|T| + \mathbf{E}|\Omega|$ be only of the order $N/\sqrt{\log N}$ would not allow taking the
supremum via the standard union bound. Our approach requires $\mathbf{E}|T| + \mathbf{E}|\Omega|$ to be of the
order $N/\log N$.

By the symmetry of the Fourier transform, the same is true for $P_2(\omega)$. This finishes the
proof of Lemma 4.5 and of Theorem 4.2. ■
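
The role of the $N/\log N$ scaling in the union bound can be seen from a short calculation. The sketch below is illustrative arithmetic only: it drops the $(1 + o(1))$ factor, treats $\rho = \rho_T + \rho_\Omega$ as a single number, and uses assumed values $N = 1024$ and $\beta = 1$; the function name `union_bound` is hypothetical.

```python
import math

def union_bound(N, rho):
    """N times the per-index bound 4*exp(-1/(8*rho)); the (1 + o(1)) factor is dropped."""
    return N * 4.0 * math.exp(-1.0 / (8.0 * rho))

N, beta = 1024, 1.0                                              # assumed illustrative values
rho_log = 1.0 / (8.0 * (beta + 1.0) * math.log(N))               # rho of order 1/log N
rho_sqrt = 1.0 / (8.0 * (beta + 1.0) * math.sqrt(math.log(N)))   # rho of order 1/sqrt(log N)

bound_log = union_bound(N, rho_log)     # reduces to 4 * N**(-beta)
bound_sqrt = union_bound(N, rho_sqrt)   # larger than 1, so it carries no information
print(bound_log, bound_sqrt)
```

With $\rho$ of order $1/\log N$ the per-index bound is a negative power of $N$ and survives multiplication by $N$; with $\rho$ of order $1/\sqrt{\log N}$ it decays only like $\exp(-c\sqrt{\log N})$, which the factor $N$ overwhelms.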



5 Robust UPs and Basis Pursuit 



The results of Sections 3 and 4 extend to the general situation where the dictionary $\Phi$ is a
union of two orthonormal bases $\Phi_1, \Phi_2$. In this section, we present results for pairs of
orthogonal bases that parallel those for the time-frequency dictionary presented in Sections 3
and 4. The bounds will depend critically on the degree of similarity of $\Phi_1$ and $\Phi_2$, which we
measure using the mutual incoherence defined in (1.4), $\mu := \mu(\Phi_1, \Phi_2)$. As we will see,
our generalization introduces additional "$\log N$" factors. It is our conjecture that bounds
that do not include these factors exist.

As before, the key result is the quantitative robust uncertainty principle. We use the same
probabilistic setup to sample the support sets $\Gamma_1$, $\Gamma_2$ in the $\Phi_1$ and $\Phi_2$ domains respectively.
The statement below is the analogue of Theorem 3.1.



Theorem 5.1 Let $\Phi := (\Phi_1 \ \Phi_2)$ be a dictionary composed of a union of two orthonormal
bases with mutual incoherence $\mu$. Suppose the sampling parameters obey

\[
\mathbf{E}|\Gamma_1| + \mathbf{E}|\Gamma_2| \le \frac{C_1}{\mu^2 \cdot ((\beta+1)\log N)^{5/2}} \tag{5.1}
\]

for some positive constant $C_1 > 0$. Assume $\mu \le 1/\sqrt{2(\beta+1)\log N}$. Then with probability
at least $1 - O(\log N \cdot N^{-\beta})$, every signal $f$ with $\Phi_1 f$ supported on $\Gamma_1$ has most of its energy
in the $\Phi_2$-domain outside of $\Gamma_2$:

\[
\|\Phi_2 f \cdot 1_{\Gamma_2}\|^2 \le \|f\|^2/2,
\]

and vice versa. As a result, for nearly all pairs $(\Gamma_1, \Gamma_2)$ with sizes obeying (5.1), it is
impossible to find a signal $f$ supported on $\Gamma_1$ in the $\Phi_1$-domain and $\Gamma_2$ in the $\Phi_2$-domain.



We would like to re-emphasize the significant difference between these results and (1.5).
Namely, (5.1) effectively squares the size of the joint support since, ignoring log-like factors,
the factor $1/\mu$ is replaced by $1/\mu^2$. For example, in the case where the two bases are
maximally incoherent, i.e. $\mu = 1/\sqrt{N}$, our condition says that it is nearly impossible to
concentrate a function in both domains simultaneously unless (again, up to logarithmic
factors)

\[
|\Gamma_1| + |\Gamma_2| \sim N,
\]

which needs to be compared with (1.5)

\[
|\Gamma_1| + |\Gamma_2| \ge 2\sqrt{N}.
\]

For mutual incoherences scaling like a power-law $\mu \sim N^{-\gamma}$, our condition essentially reads
$|\Gamma_1| + |\Gamma_2| \sim N^{2\gamma}$ compared to $|\Gamma_1| + |\Gamma_2| \sim N^{\gamma}$.
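
The two scalings can be made concrete by computing the mutual incoherence numerically. The sketch below assumes the usual definition, the largest magnitude of an inner product between an element of one basis and an element of the other (definition (1.4) is not reproduced in this section); the function name `mutual_incoherence` and the value $N = 64$ are illustrative choices.

```python
import numpy as np

def mutual_incoherence(B1, B2):
    """Largest magnitude of an inner product between a column of B1 and a column of B2."""
    return np.max(np.abs(B1.conj().T @ B2))

N = 64                                            # assumed illustrative size
t = np.arange(N)
spikes = np.eye(N, dtype=complex)
fourier = np.exp(2j * np.pi * np.outer(t, t) / N) / np.sqrt(N)

mu_tf = mutual_incoherence(spikes, fourier)       # spike/sinusoid pair: 1/sqrt(N)
mu_same = mutual_incoherence(spikes, spikes)      # identical bases share every element
print(mu_tf, 2 / mu_tf, 1 / mu_tf**2)             # mu, the 2/mu scale, the 1/mu^2 scale
print(mu_same)
```

For the spike and sinusoid pair the $1/\mu$ scale is of order $\sqrt{N}$ while the $1/\mu^2$ scale is $N$, which is the "squaring" of the joint support size described above.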

The proof of Theorem 5.1 directly parallels that of Theorem 3.1, with $A := R_{\Gamma_2} \Phi_2^* \Phi_1 R_{\Gamma_1}^*$
playing the role of the partial Fourier transform from $T$ to $\Omega$. Our argument calls for
bounds on the eigenvalues of the random matrix $A^* A$, which we write as the sum of two
terms, a diagonal and an off-diagonal term:

\[
A^* A = D + H_1.
\]

We use large deviation theory to control the norm of $D$ while bounds on the size of $H_1$
are obtained by using moment estimates. This calculation involves estimates about the
expected value of the Frobenius norm of large powers of $A^* A$ and is very delicate. We do
not reproduce all these arguments here (this is the scope of a whole separate article) and
simply state a result which is proved in [4]

\[
\mathbf{P}(\|A^* A\| \ge 1/2) \le C \cdot \log N \cdot N^{-\beta} \tag{5.2}
\]

for $\mathbf{E}|\Gamma_1| + \mathbf{E}|\Gamma_2|$ obeying (5.1) (here $C$ is some universal positive constant). Now for (5.2) to
hold, we also need that the incoherence be not too large and obey $\mu \le 1/\sqrt{2(\beta+1)\log N}$,
which is the additional condition stated in the hypothesis. The idea that $\mu$ cannot be too
large is somewhat natural as otherwise for $\mu = 1$, say, the two bases would share at least
one element and we would have $\|A^* A\| = 1$ as soon as $\Gamma_1$ and $\Gamma_2$ contain a common
element. As we have seen in Section 3, the size estimate (5.2) would then establish the
theorem.
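
The way a bound on $\|A^* A\|$ controls energy concentration can be illustrated numerically. The sketch below is an assumption-laden example rather than the paper's experiment: it pairs the spike basis with an orthobasis obtained from the QR factorization of a complex Gaussian matrix (one assumed way to draw a random orthobasis), and uses fixed support sizes of 8.

```python
import numpy as np

rng = np.random.default_rng(2)              # assumed seed
N, k1, k2 = 256, 8, 8                       # assumed sizes for illustration

Phi1 = np.eye(N, dtype=complex)             # spike basis
# Random orthobasis from the QR factorization of a complex Gaussian matrix.
G = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
Phi2, _ = np.linalg.qr(G)

Gamma1 = rng.choice(N, size=k1, replace=False)
Gamma2 = rng.choice(N, size=k2, replace=False)

cross = Phi2.conj().T @ Phi1                # Phi_2^* Phi_1
mu = np.max(np.abs(cross))                  # mutual incoherence of the pair
A = cross[np.ix_(Gamma2, Gamma1)]           # R_{Gamma_2} Phi_2^* Phi_1 R_{Gamma_1}^*
norm_AA = np.linalg.norm(A, 2) ** 2         # ||A^* A|| equals ||A||^2

# Signal synthesized from Phi_1 coefficients on Gamma_1, then analyzed in the Phi_2 domain.
c = rng.standard_normal(k1) + 1j * rng.standard_normal(k1)
f = Phi1[:, Gamma1] @ c
energy_on_Gamma2 = np.sum(np.abs((Phi2.conj().T @ f)[Gamma2]) ** 2)
fraction = energy_on_Gamma2 / np.sum(np.abs(f) ** 2)
print(norm_AA, fraction, k1 * k2 * mu**2)
```

The fraction of energy on $\Gamma_2$ never exceeds $\|A^* A\|$, so an estimate such as (5.2) caps it at one half; the last printed number is the crude Frobenius bound $|\Gamma_1| \cdot |\Gamma_2| \cdot \mu^2$.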

The generalized $\ell_0$-uniqueness result follows directly from Theorem 5.1:

Theorem 5.2 Let $f = \Phi\alpha$ be an observed signal sampled as in Section 2, and with
parameters obeying

\[
\mathbf{E}|\Gamma_1| + \mathbf{E}|\Gamma_2| \le \frac{C_2}{\mu^2 \cdot ((\beta+1)\log N)^{5/2}}.
\]

Assume $\mu \le 1/\sqrt{2(\beta+1)\log N}$. Then with probability $1 - O(\log N \cdot N^{-\beta})$, the solution to
$(P_0)$ is unique and equal to $\alpha$.



The only change to the proof presented in Section 4.1 is in the analogue to Lemma 4.2:

Lemma 5.1 Let $\Gamma_1, \Gamma_2$ be fixed subsets of $\{0, \ldots, N-1\}$, let $\Gamma = \Gamma_1 \cup \Gamma_2$, and let $Q_\Gamma$ be
the $N \times |\Gamma|$ matrix

\[
Q_\Gamma = (\Phi_1 R_{\Gamma_1}^* \ \ \Phi_2 R_{\Gamma_2}^*).
\]

If $|\Gamma| < 2/\mu^2$, then

\[
\dim(\operatorname{Null}(Q_\Gamma)) < \frac{|\Gamma|}{2}.
\]

The proof of Lemma 5.1 has exactly the same structure as the proof of Lemma 4.2. The
only modification comes in calculating the trace of $G^* G$; here each term can be bounded
by $\mu^2$, and we have $\operatorname{Tr}(G^* G) \le 2(|\Gamma_1| \cdot |\Gamma_2|)\mu^2$. Lemma 5.1 follows.
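
A small example shows the null space bound at work. The sketch below uses the spike and sinusoid bases with $N = 16$ and a Dirac comb support (spikes and frequencies at multiples of 4), which is a classical case where the columns are linearly dependent; the helper `null_space_dimension` and the sizes are assumptions made for illustration, and the rank is computed numerically with NumPy's default tolerance.

```python
import numpy as np

def null_space_dimension(Phi1, Phi2, Gamma1, Gamma2):
    """dim Null(Q) for Q = (Phi_1 R^*_{Gamma_1}  Phi_2 R^*_{Gamma_2})."""
    Q = np.hstack([Phi1[:, Gamma1], Phi2[:, Gamma2]])
    return Q.shape[1] - np.linalg.matrix_rank(Q)

N = 16                                           # assumed illustrative size
t = np.arange(N)
spikes = np.eye(N, dtype=complex)
fourier = np.exp(2j * np.pi * np.outer(t, t) / N) / np.sqrt(N)
mu = np.max(np.abs(spikes.conj().T @ fourier))   # 1/sqrt(N) for this pair

comb = np.arange(0, N, 4)                        # indices 0, 4, 8, 12 in both domains
size = 2 * comb.size                             # |Gamma| = 8
nullity = null_space_dimension(spikes, fourier, comb, comb)
print(size, 2 / mu**2, nullity, size / 2)
```

Here $|\Gamma| = 8$ is below $2/\mu^2 = 32$, and the null space, spanned by the comb identity, is much smaller than $|\Gamma|/2 = 4$.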

The conditions for the equivalence of $(P_0)$ and $(P_1)$ can also be generalized.

Theorem 5.3 Let $f = \Phi\alpha$ be a random signal generated as in Section 2 with

\[
\mathbf{E}|\Gamma_1| + \mathbf{E}|\Gamma_2| \le \frac{C_3}{\mu^2 \cdot ((\beta+1)\log N)^{5/2}}.
\]

Assume $\mu \le 1/\sqrt{2(\beta+1)\log N}$. Then with probability $1 - O(\log N \cdot N^{-\beta})$, the solutions of
$(P_0)$ and $(P_1)$ are identical and equal to $\alpha$.

The proof of Theorem 5.3 is again almost exactly the same as that we have already seen.
Using Theorem 5.1, the eigenvalues of $(\Phi_\Gamma^* \Phi_\Gamma)^{-1}$ are controlled, allowing us to construct a
dual vector meeting the conditions (4.6) and (4.7) of Section 4.2.1. Note that the $(\log N)^{5/2}$
term in the denominator means that $\mathbf{P}(|P(\gamma)| \ge 1)$, $\gamma \in \Gamma^c$, goes to zero at a much
faster speed than a negative power of $N$; it decays as $\exp(-\rho(\log N)^5)$ for some positive
constant $\rho > 0$.

6 Numerical Experiments



From a practical standpoint, the ability of $(P_1)$ to recover sparse decompositions is nothing
short of amazing. To illustrate this fact, we consider a 256 point signal composed of 60
spikes and 60 sinusoids; $|T| + |\Omega| \approx N/2$, see Figure 1. Solving $(P_1)$ recovers the original
decomposition exactly.

We then empirically validate the previous numerical result by repeating the experiment for
various signals and sample sizes, see Figure 2. These experiments were designed as follows:

1. set $N_\Gamma$ as a percentage of the signal length $N$;

2. select a support set $\Gamma = T \cup \Omega$ of size $|\Gamma| = N_\Gamma$ uniformly at random;

3. sample a vector $\alpha$ on $\Gamma$ with independent and identically distributed Gaussian entries¹;

4. make $f = \Phi\alpha$;

5. solve $(P_1)$ and obtain $\alpha^\sharp$;

6. compare $\alpha$ to $\alpha^\sharp$;

7. repeat 100 times for each $N_\Gamma$;

8. repeat for signal lengths $N = 256, 512, 1024$.
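
Part of this protocol can be sketched with NumPy alone. The code below is a simplified stand-in, not the authors' experiment: instead of solving $(P_1)$ in steps 5 and 6 (which requires a convex solver and is not attempted here), it tests the sufficient condition of Section 4.2.1, namely that the minimum energy vector is a dual vector. Complex Gaussian coefficients, the support sizes, the 20 trials per size and all function names are assumptions made for illustration.

```python
import numpy as np

def dictionary(N):
    """Time-frequency dictionary (I  F*) with unit-norm columns."""
    t = np.arange(N)
    F_adj = np.exp(2j * np.pi * np.outer(t, t) / N) / np.sqrt(N)
    return np.hstack([np.eye(N, dtype=complex), F_adj])

def dual_certificate_holds(Phi, support, alpha_on_support):
    """True if the minimum energy vector P obeys |P| < 1 off the support."""
    Phi_G = Phi[:, support]
    signs = alpha_on_support / np.abs(alpha_on_support)
    S = Phi_G @ np.linalg.solve(Phi_G.conj().T @ Phi_G, signs)
    P = Phi.conj().T @ S
    off = np.ones(Phi.shape[1], dtype=bool)
    off[support] = False
    return bool(np.max(np.abs(P[off])) < 1.0)

def success_rate(N, n_gamma, trials, rng):
    """Fraction of random trials in which the sufficient condition holds."""
    Phi = dictionary(N)
    hits = 0
    for _ in range(trials):
        support = rng.choice(2 * N, size=n_gamma, replace=False)    # step 2
        alpha = rng.standard_normal(n_gamma) + 1j * rng.standard_normal(n_gamma)  # step 3
        hits += dual_certificate_holds(Phi, support, alpha)         # replaces steps 4 to 6
    return hits / trials

rng = np.random.default_rng(3)                   # assumed seed
rates = {n: success_rate(256, n, 20, rng) for n in (8, 32, 64)}
print(rates)
```

For very small supports the condition always holds (a norm estimate for 8 nonzero terms in $N = 256$ bounds the off-support values by about 0.67); the rates at larger supports are what the experiment is designed to measure.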

Figure 2(a) shows that we are numerically able to recover "sparse" superpositions of spikes
and sinusoids when $|T| + |\Omega|$ is close to $N/2$, at least for this range of sample sizes $N$ (we
use the quotations since decompositions of this order can hardly be considered sparse).
Figure 2(b) plots the success rate of the sufficient condition for the recovery of the sparsest
$\alpha$ developed in Section 4.2.1 (i.e. the minimum energy signal is a dual vector). Numerically,
the sufficient condition holds when $|T| + |\Omega|$ is close to $N/5$.

The time-frequency dictionary is special in that it is maximally incoherent ($\mu = 1/\sqrt{N}$). But as
suggested in [10], incoherence between two bases is the rule, rather than the exception. To
illustrate this, the above experiment was repeated for $N = 256$ with a dictionary that is a
union of the spike basis and of an orthobasis sampled uniformly at random (think about
orthogonalizing $N$ vectors sampled independently and uniformly on the unit-sphere of $\mathbb{C}^N$).
As shown in Figure 3, the results are very close to those obtained with time-frequency
dictionaries; we recover "sparse" decompositions of size about $|\Gamma_1| + |\Gamma_2| \le 0.4 \cdot N$.

7 Discussion 

In this paper, we have demonstrated that except for a negligible fraction of pairs $(T, \Omega)$,
the behavior of the discrete uncertainty relation is very different from what worst case
scenarios, which have been the focus of the literature thus far, suggest. We introduced
probability models and a robust uncertainty principle showing that for nearly all pairs

¹ The results presented here do not seem to depend on the actual distribution used to sample the
coefficients.

[Figure 1 image; labels in panel (a): spike component, sinusoidal component]

Figure 1: Recovery of a "sparse" decomposition. (a) Magnitudes of a randomly generated coefficient
vector $\alpha$ with 120 nonzero components. The spike components are on the left (indices 1-256) and
the sinusoids are on the right (indices 257-512). The spike magnitudes are made small compared
to the magnitudes of the sinusoids for effect; we cannot locate the spikes by inspection from the
observed signal $f$, whose real part is shown in (b). Solving $(P_1)$ separates $f$ into its spike (c) and
sinusoidal components (d) (the real parts are plotted).



[Figure 2 image; panel titles: $\ell_1$-recovery, sufficient condition; horizontal axis: $(|T| + |\Omega|)/N$]

Figure 2: $\ell_1$-recovery for the time-frequency dictionary. (a) Success rate of $(P_1)$ in recovering
the sparsest decomposition versus the number of nonzero terms. (b) Success rate of the sufficient
condition (the minimum energy signal is a dual vector).

[Figure 3 image; panel titles: $\ell_1$-recovery, sufficient condition; horizontal axis: $(|T| + |\Omega|)/N$]

Figure 3: $\ell_1$-recovery for the spike-random dictionary. (a) Success rate of $(P_1)$ in recovering the
sparsest decomposition versus the number of nonzero terms. (b) Success rate of the sufficient
condition.



$(T, \Omega)$, it is actually impossible to concentrate a discrete signal on $T$ and $\Omega$ simultaneously
unless the size of the joint support $|T| + |\Omega|$ be at least of the order of $N/\sqrt{\log N}$. We
derived significant consequences of this new uncertainty principle, showing how one can
recover sparse decompositions by solving simple convex programs.

Our sampling models were selected in perhaps the most natural way, giving to each time 
point and to each frequency point the same chance of being sampled, independently of 
the others. Now there is little doubt that conclusions similar to those derived in this pa- 
per would hold for other probability models. In fact, our analysis develops a machinery 
amenable to other setups. The centerpiece is the study of the singular values of partial 
Fourier transforms. For other sampling models, such as models biased toward low or high
frequencies for example, one would need to develop analogues of Lemma 3.1. Our machinery
would then nearly automatically transform these new estimates into corresponding
claims.

In conclusion, we would like to mention areas for possible improvement and refinement. 
First, although we have made an effort to obtain explicit constants in all our statements 
(with the exception of section 5), there is little doubt that a much more sophisticated 
analysis would yield better estimates for the singular values of partial Fourier transforms, 
and thus provide better constants. Another important question, which we leave for future
research, is whether the $1/\sqrt{\log N}$ factor in the QRUP (Theorem 3.1) and the $1/\log N$ factor
for the exact $\ell_1$-reconstruction (Theorem 4.2) are necessary. Finally, we already argued
that one really needs to randomly sample the support to derive our results, but we wonder
whether the signs of the coefficients $\alpha$ (in $f = \Phi\alpha$) need to be
randomized as well. Or would it be possible to show analogs of Theorem 4.2 ($\ell_1$ recovers
the sparsest decomposition) for all $\alpha$, provided that the support of $\alpha$ is not too large
(and randomly selected)? Recent work [5,6] suggests that this might be possible, at the
expense of additional logarithmic factors.

8 Appendix: Concentration-of-Measure Inequalities



The Hoeffding inequality is a well-known large deviation bound for sums of independent 
random variables. For a proof and interesting discussion, see [18]. 



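Lemma 8.1 (Hoeffding inequality) Let $X_0, \ldots, X_{N-1}$ be independent real-valued random
variables such that $\mathbf{E} X_j = 0$ and $|X_j| \le a_j$ for some positive real numbers $a_j$. For $\epsilon > 0$,

\[
\mathbf{P}\left( \Big| \sum_{j=0}^{N-1} X_j \Big| \ge \epsilon \right) \le 2 \exp\left( -\frac{\epsilon^2}{2\|a\|_2^2} \right),
\]

where $\|a\|_2^2 = \sum_j a_j^2$.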

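Lemma 8.2 (complex Hoeffding inequality) Let $X_0, \ldots, X_{N-1}$ be independent complex-valued
random variables such that $\mathbf{E} X_j = 0$ and $|X_j| \le a_j$. Then for $\epsilon > 0$,

\[
\mathbf{P}\left( \Big| \sum_{j=0}^{N-1} X_j \Big| \ge \epsilon \right) \le 4 \exp\left( -\frac{\epsilon^2}{4\|a\|_2^2} \right).
\]
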

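Proof Separate the $X_j$ into their real and imaginary parts; $X'_j = \operatorname{Re} X_j$, $X''_j = \operatorname{Im} X_j$.
Clearly, $|X'_j| \le a_j$ and $|X''_j| \le a_j$. The result follows immediately from Lemma 8.1 and the
fact that

\[
\mathbf{P}\left( \Big| \sum_{j=0}^{N-1} X_j \Big| \ge \epsilon \right) \le \mathbf{P}\left( \Big| \sum_{j=0}^{N-1} X'_j \Big| \ge \epsilon/\sqrt{2} \right) + \mathbf{P}\left( \Big| \sum_{j=0}^{N-1} X''_j \Big| \ge \epsilon/\sqrt{2} \right).
\]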
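
The complex Hoeffding bound can be compared with a simulated tail. The sketch below is a Monte Carlo illustration under assumed parameters (50 terms with $a_j = 0.05$, threshold $\epsilon = 1$, 5000 trials); the variables are taken to be $a_j$ times a uniformly random unit-modulus phase, which is one convenient zero-mean bounded model and not the only one covered by the lemma.

```python
import numpy as np

def complex_hoeffding_bound(a, eps):
    """Right-hand side of Lemma 8.2: 4 * exp(-eps^2 / (4 * ||a||_2^2))."""
    return 4.0 * np.exp(-eps**2 / (4.0 * np.sum(np.abs(a) ** 2)))

rng = np.random.default_rng(4)                 # assumed seed
n, trials, eps = 50, 5000, 1.0                 # assumed illustrative values
a = np.full(n, 0.05)                           # bounds a_j, so ||a||_2^2 = 0.125

# X_j = a_j * (random unit-modulus phase): independent, zero mean, |X_j| <= a_j.
phases = np.exp(2j * np.pi * rng.random((trials, n)))
sums = (a * phases).sum(axis=1)
empirical_tail = np.mean(np.abs(sums) >= eps)  # simulated P(|sum X_j| >= eps)
bound = complex_hoeffding_bound(a, eps)
print(empirical_tail, bound)
```

The simulated tail should sit well below the bound, which reflects the fact that Hoeffding-type inequalities are convenient rather than tight.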